
feat: Add kube_deployment_spec_topology_spread_constraints metric for issue #2701 #2728


Open · wants to merge 7 commits into main

Conversation

SoumyaRaikwar

What this PR does / why we need it

This PR adds the kube_deployment_spec_topology_spread_constraints metric that counts the number of topology spread constraints defined in a deployment's pod template specification.

It addresses the topology spread constraints monitoring requirement from issue #2701, which requested visibility into scheduling primitives, including pod topology spread constraints, for monitoring workload pod distribution.

Which issue(s) this PR fixes

Addresses the topology spread constraints monitoring portion of #2701 (Add schedule spec and status for workload).

Problem Solved

Issue #2701 identified that operators need to monitor various scheduling primitives to detect when workload distribution breaks because of pod priority preemption or node-pressure eviction.

- Adds a new metric that counts topology spread constraints in deployment pod templates (see the sketch below)
- Includes comprehensive test coverage for both cases (with and without constraints)
- Follows existing metric patterns and stability guidelines
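
For orientation, here is a minimal, hedged sketch of what the metric reduces to. It is not the PR's actual implementation; it uses only the upstream k8s.io/api types to show the value the gauge would report per deployment.

```go
// Sketch only: the value behind kube_deployment_spec_topology_spread_constraints
// is the number of topology spread constraints on the deployment's pod template.
package main

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
)

// topologySpreadConstraintCount returns the number of topology spread
// constraints declared in the Deployment's pod template spec.
func topologySpreadConstraintCount(d *appsv1.Deployment) float64 {
	return float64(len(d.Spec.Template.Spec.TopologySpreadConstraints))
}

func main() {
	d := &appsv1.Deployment{
		Spec: appsv1.DeploymentSpec{
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					TopologySpreadConstraints: []corev1.TopologySpreadConstraint{
						{MaxSkew: 1, TopologyKey: "topology.kubernetes.io/zone", WhenUnsatisfiable: corev1.DoNotSchedule},
					},
				},
			},
		},
	}
	// A deployment like this would be exported roughly as:
	// kube_deployment_spec_topology_spread_constraints{namespace="...",deployment="..."} 1
	fmt.Println(topologySpreadConstraintCount(d))
}
```

In kube-state-metrics itself the count would be wired into the deployment metric family generators; the snippet above only illustrates the essential logic.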
@k8s-ci-robot added the cncf-cla: yes and needs-triage labels on Aug 10, 2025
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If kube-state-metrics contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: SoumyaRaikwar
Once this PR has been reviewed and has the lgtm label, please assign rexagod for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the size/XXL label on Aug 10, 2025
@SoumyaRaikwar changed the title from "Add kube_deployment_spec_topology_spread_constraints metric for issue #2701" to "feat: Add kube_deployment_spec_topology_spread_constraints metric for issue #2701" on Aug 10, 2025
@k8s-ci-robot added the size/XL label and removed the size/XXL label on Aug 10, 2025
@k8s-ci-robot added the size/L label and removed the size/XL label on Aug 10, 2025
@mrueg
Member

mrueg commented Aug 13, 2025

How would you use this metric for alerting or to provide info about the deployment?

@SoumyaRaikwar
Author

> How would you use this metric for alerting or to provide info about the deployment?

These topology spread constraint metrics enable alerting on workload distribution policies: kube_deployment_spec_topology_spread_constraints > 0 identifies deployments that declare spread constraints, which helps detect when constraints exist but workloads still end up unevenly distributed across zones or nodes during resource pressure.

You can alert on missing distribution policies with (kube_deployment_spec_replicas > 1) and (kube_deployment_spec_topology_spread_constraints == 0) to identify multi-replica deployments that lack a spread configuration.

For dashboards, count(kube_deployment_spec_topology_spread_constraints > 0) shows cluster-wide adoption of topology spread policies, complementing the pod affinity/anti-affinity metrics I implemented in PR #2733.

During incidents, these metrics help correlate why workloads became concentrated in specific topology domains or why pods failed to schedule due to overly restrictive spread policies.

Together with the pod affinity/anti-affinity metrics in PR #2733, this completes the scheduling observability work from issue #2701: operators get visibility into both co-location/separation rules and even-distribution policies across cluster topology. Thanks @mrueg!

@k8s-ci-robot added the needs-rebase label on Aug 14, 2025
@k8s-ci-robot
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@mrueg
Member

mrueg commented Aug 14, 2025

The same comment as in the other PR (#2733) applies here: we should have explicit metrics per kube_deployment_topology_spread_constraint{} and not simply count a length.
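
To make the suggestion concrete, here is a hedged sketch (not code from this PR; the metric name follows the comment above and the label names are only illustrative): rather than exposing a count, each constraint would get its own series carrying labels such as topology_key, max_skew, and when_unsatisfiable.

```go
// Illustrative sketch of per-constraint series, using only upstream
// k8s.io/api types; metric and label names here are hypothetical.
package main

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
)

// perConstraintLabelSets builds one label set per topology spread constraint,
// which would back series such as:
// kube_deployment_topology_spread_constraint{topology_key="topology.kubernetes.io/zone",max_skew="1",when_unsatisfiable="DoNotSchedule"} 1
func perConstraintLabelSets(d *appsv1.Deployment) []map[string]string {
	var out []map[string]string
	for _, c := range d.Spec.Template.Spec.TopologySpreadConstraints {
		out = append(out, map[string]string{
			"namespace":          d.Namespace,
			"deployment":         d.Name,
			"topology_key":       c.TopologyKey,
			"max_skew":           fmt.Sprintf("%d", c.MaxSkew),
			"when_unsatisfiable": string(c.WhenUnsatisfiable),
		})
	}
	return out
}

func main() {
	d := &appsv1.Deployment{
		Spec: appsv1.DeploymentSpec{
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					TopologySpreadConstraints: []corev1.TopologySpreadConstraint{
						{MaxSkew: 1, TopologyKey: "topology.kubernetes.io/zone", WhenUnsatisfiable: corev1.DoNotSchedule},
						{MaxSkew: 2, TopologyKey: "kubernetes.io/hostname", WhenUnsatisfiable: corev1.ScheduleAnyway},
					},
				},
			},
		},
	}
	// Prints one label set per constraint instead of a single count.
	for _, labels := range perConstraintLabelSets(d) {
		fmt.Println(labels)
	}
}
```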

Labels
cncf-cla: yes · needs-rebase · needs-triage · size/L